Improved error reporting #129

airportyh · 2019-08-01T19:55:21Z

This PR aims to improve the error message displayed to the user when a syntax error is encountered by the lexer. In particular:

5 input lines are displayed on error to give users context
Line numbers are displayed
Attaches the errored token to the thrown error object, eventhough it's not a "good" token - this allows the handler (in particular in Nearley), to have access to the errored input and display as it pleases

This PR is related to kach/nearley#459.

Screenshot:

for context and line numbers.

tjvr

I appreciate the desire to improve error reporting in parsers, but I'm not convinced that moo should return extra lines of the input.

tjvr · 2019-08-01T21:14:03Z

moo.js

-    var eol = token.lineBreaks ? token.text.indexOf('\n') : token.text.length
-    var firstLine = this.buffer.substring(start, token.offset + eol)
+
+    var lines = this.buffer.split("\n")


We avoid using split here because we've already tokenized the input. Error handling doesn't necessarily need to be as fast as the happy path, but it could potentially be an issue if the input buffer is large.

We avoid using split here because we've already tokenized the input. Error handling doesn't necessarily need to be as fast as the happy path, but it could potentially be an issue if the input buffer is large.

We could walk the string backwards instead if that is a concern.

Walking backwards (and forwards—see my comment) is a better option. Tokenizers often deal with very large inputs.

tjvr · 2019-08-01T21:18:10Z

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

airportyh · 2019-08-02T01:17:14Z

I appreciate the desire to improve error reporting in parsers, but I'm not convinced that moo should return extra lines of the input.

Could you elaborate? Do you have concerns about effects that the change could cause? Or don't think the issue is important? I'd be happy to work on the code more to make it satisfactory to you, but if you are not interested at all to merge this then I will leave it alone.

airportyh · 2019-08-02T01:28:47Z

Attaches the errored token to the thrown error object, eventhough it's not a "good" token

I'm not convinced about this, either; if you want the error token, can't you define one explicitly?

I wanted to display the character that the lexer errored on within the parser error message. This was not possible without reaching in to the non-public api of the lexer (as specified on the Nearley website) to access its current index and buffer. I felt that passing this token back is slightly better than doing that. Open to other suggestions though.

nathan · 2019-08-05T18:40:28Z

@airportyh

It looks like your terminal emulator is treating the unicode arrow as either fullwidth or proportional width, as many emulators do, so it only sort of lines up with the character it's trying to point to. It's better just to use a caret/circumflex (^) because that's pretty much guaranteed to line up, even if it doesn't look quite as arrow-y.

I find context lines in error messages quite useful, and line numbers marginally so. However, you've implemented context wrong: you should give n lines on both sides of the problematic one, not just n lines before. It's much more useful to see this:

5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^
8    if (result.extremities > 10) return false
9    try {

than this:

3    return lizard
4  }
5  function checkPerson(name) {
6    const person = findPerson(name)
7    const result process(person)
                  ^

(See, e.g., the --unified or --context options of diff or django's debug tracebacks.) The blank line required for the pointer also draws your eye straight to the (in)correct line instead of forcing you to scan to the bottom to find the one that caused the error.

Another formatting nit: there should be at least two columns between line numbers and code: two spaces, a full stop and a space, or something similar.

@tjvr I don't have a problem with making this the default behavior, but if you'd like to be consistent with previous versions and/or keep error messages less bulky, perhaps we could make it an option to formatError.

nathan · 2019-08-05T18:45:46Z

moo.js

+          return pad(token.line - lines.length + i + 1, lastLineDigits) + " " + line;
+      })
+      .join("\n")
+    message += "\n" + pad("", lastLineDigits + token.col) + "⬆︎\n"


Error messages should not end with blank lines unless you have a Really Good Reason™. I'd get rid of the terminal \n.

nathan · 2019-08-05T18:49:02Z

moo.js

+
+    function pad(n, length) {
+      var s = String(n)
+      return Array(length - s.length + 1).join(" ") + s


Even though you never violate this invariant, it makes me uncomfortable (i.e., will trip up future maintainers) that this function will throw if you give it something longer than length, since functions like pad usually don't. Either inline the code (since you made the function local and you only call it once) or make it more robust and put it with the other utility functions at the top.

nathan · 2019-08-05T18:51:29Z

moo.js

-    var eol = token.lineBreaks ? token.text.indexOf('\n') : token.text.length
-    var firstLine = this.buffer.substring(start, token.offset + eol)
+
+    var lines = this.buffer.split("\n")


Walking backwards (and forwards—see my comment) is a better option. Tokenizers often deal with very large inputs.

tjvr · 2019-08-06T22:24:50Z

I agree with all of Nathan's comments. Since he's happy to make this the default, so am I. I think I'd rather have bulkier error messages for everyone than introduce extra arguments to formatError. I think one or two lines before and after the error would be sufficient. :)

As above, I'd rather not attach the token object to the error message, but it sounds like we have to.

airportyh · 2019-08-08T23:05:51Z

Nice! I appreciate the feedback. I plan to address the mentioned issues soon. Here I'll summarize what I am planning to do:

avoid buffer.split and walk the string buffer backwards instead
revert arrow character back to ^
display ~2 lines of input before and after the problematic line
make pad robust and handle cases where input string is longer than length (throw or crop? I can consult other well-known implementations)
reconsider including the "bad" token in the error object. If we don't do this, it's not the end of the world, since the code context output will already make it obvious where the problem is. This feature is a little weird because the attached token is not a valid thing.

If either of you would like to change this list please give me a heads up. I plan to start work some time tomorrow.

nathan · 2019-08-09T13:34:42Z

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

airportyh · 2019-08-09T14:08:59Z

throw or crop? I can consult other well-known implementations

Neither. Return the original string, like String.prototype.padStart.

Okay.

Revert arrow character to ^. Error handling in pad function Revert attaching of invalid token to error.

airportyh · 2019-08-09T16:03:39Z

New code is in! Please take a look.

airportyh · 2019-08-19T14:30:33Z

@tjvr @nathan any question on this? Let me know if there's anything I can do to help.

tjvr

Haven't had time to go through it all I'm afraid, but here are a couple of comments!

tjvr · 2019-08-09T20:30:42Z

moo.js

@@ -559,6 +593,8 @@
    }
  }

+  Lexer.prototype.lastNLines = lastNLines


I don't think we want to export this.

That's the only way I could write tests for it. Do you know of another way? Or we could ditch the tests, which I am okay with too.

Removed this and removed tests.

tjvr · 2019-08-09T20:32:09Z

moo.js

@@ -53,6 +53,39 @@
    }
  }

+  function pad(n, length) {
+    var s = String(n)


Why are we calling String() here?

I supposed that superfluous if we simply assume the input is a string. I can get rid of it if you prefer.

Yes please 👍

TheGrandmother · 2020-08-15T15:30:42Z

I would love to see this merged....

      if (err.stack.match(/.*moo\.js.*/)) {
        const m = err.message.match(/.*line (\d+) col (\d+)/)
        const newErr = new CompilerError(err.message)
        err.position = {line: parseInt(m[0]), col: parseInt(m[1])}
        throw newErr
      }

This is the sketchy workaround I had to use to differentiate lexer errors and get the position out. :P

tjvr

Thanks for this!

More input lines displayed on error

ae20c76

for context and line numbers.

airportyh mentioned this pull request Aug 1, 2019

Improve error reporting kach/nearley#459

Merged

tjvr reviewed Aug 1, 2019

View reviewed changes

nathan requested changes Aug 5, 2019

View reviewed changes

Optimized getting few lines of input to display.

5b73140

Revert arrow character to ^. Error handling in pad function Revert attaching of invalid token to error.

tjvr reviewed Aug 19, 2019

View reviewed changes

Addressed code review comments.

4d94b61

nathan self-requested a review August 21, 2019 13:11

tjvr mentioned this pull request Sep 28, 2020

Display context and mark out error tokens #145

Open

tjvr approved these changes May 22, 2022

View reviewed changes

tjvr merged commit 813e006 into no-context:master May 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved error reporting #129

Improved error reporting #129

airportyh commented Aug 1, 2019 •

edited

Loading

tjvr left a comment

tjvr Aug 1, 2019

airportyh Aug 2, 2019

nathan Aug 5, 2019

tjvr commented Aug 1, 2019

airportyh commented Aug 2, 2019 •

edited

Loading

airportyh commented Aug 2, 2019 •

edited

Loading

nathan commented Aug 5, 2019

nathan Aug 5, 2019

nathan Aug 5, 2019

nathan Aug 5, 2019

tjvr commented Aug 6, 2019 •

edited

Loading

airportyh commented Aug 8, 2019 •

edited

Loading

nathan commented Aug 9, 2019

airportyh commented Aug 9, 2019

airportyh commented Aug 9, 2019

airportyh commented Aug 19, 2019

tjvr left a comment

tjvr Aug 9, 2019

airportyh Aug 20, 2019

airportyh Aug 21, 2019

tjvr Aug 9, 2019

airportyh Aug 20, 2019

tjvr Aug 20, 2019

airportyh Aug 21, 2019

TheGrandmother commented Aug 15, 2020

tjvr left a comment

Improved error reporting #129

Improved error reporting #129

Conversation

airportyh commented Aug 1, 2019 • edited Loading

tjvr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tjvr commented Aug 1, 2019

airportyh commented Aug 2, 2019 • edited Loading

airportyh commented Aug 2, 2019 • edited Loading

nathan commented Aug 5, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tjvr commented Aug 6, 2019 • edited Loading

airportyh commented Aug 8, 2019 • edited Loading

nathan commented Aug 9, 2019

airportyh commented Aug 9, 2019

airportyh commented Aug 9, 2019

airportyh commented Aug 19, 2019

tjvr left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheGrandmother commented Aug 15, 2020

tjvr left a comment

Choose a reason for hiding this comment

airportyh commented Aug 1, 2019 •

edited

Loading

airportyh commented Aug 2, 2019 •

edited

Loading

airportyh commented Aug 2, 2019 •

edited

Loading

tjvr commented Aug 6, 2019 •

edited

Loading

airportyh commented Aug 8, 2019 •

edited

Loading